[WIP] Address feedback on ROCM-20519: replace racy pre-sync host read with hipStreamQuery #4256
Draft
Copilot wants to merge 2 commits into ROCM-20519 from
Co-authored-by: jaydeeppatel1111 <106300970+jaydeeppatel1111@users.noreply.github.com>
Copilot AI changed the title from "[WIP] [WIP] Address feedback on ROCM-20519 for DeviceSynchronize race stabilization" to "[WIP] Address feedback on ROCM-20519: replace racy pre-sync host read with hipStreamQuery" on Mar 20, 2026
Unit_hipDeviceSynchronize_Functional used a host-memory read to assert the device was still busy before calling hipDeviceSynchronize(). That read raced with the in-flight D2H hipMemcpyAsync writing to the same pinned buffer, so correctness depended entirely on the copy being slow enough not to finish first.
Motivation
Eliminate the data race in the pre-sync assertion of Unit_hipDeviceSynchronize_Functional. Reading shared host memory while a D2H async copy is in flight is undefined behavior; it was only masked by inflating the copy size via HIP_TEST_DEVICE_SYNCHRONIZE_FUNCTIONAL_COPY_MB, making the test fragile across fast GPUs and non-default HIP_LAUNCH_BLOCKING settings.
Technical Details
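The replacement check polls each stream for a "not ready" status instead of reading the shared buffer. The loop from the PR is not reproduced on this page, so the following is only a minimal sketch of that idea in plain C++. The helper name any_stream_busy and its callable-based signature are hypothetical, chosen so the logic can be exercised without a GPU.

```cpp
#include <vector>

// Hypothetical helper (not taken from the PR): returns true as soon as one
// stream's query result equals `not_ready`, i.e. that stream still has work
// pending. `query` is any callable taking a stream and returning a status.
template <typename Stream, typename Query, typename Status>
bool any_stream_busy(const std::vector<Stream>& streams, Query query,
                     Status not_ready) {
  for (const Stream& stream : streams) {
    if (query(stream) == not_ready) {
      return true;  // pending work found; no host buffer was read
    }
  }
  return false;  // every stream reported something other than not_ready
}
```

In the HIP test this would presumably be invoked along the lines of any_stream_busy(streams, hipStreamQuery, hipErrorNotReady) on a std::vector<hipStream_t>. The check is still timing-dependent, since all streams may finish before the poll runs, but it no longer reads memory that a copy is concurrently writing.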
projects/hip-tests/catch/unit/device/hipDeviceSynchronize.cc:
Removed the REQUIRE(NUM_ITERS != A[NUM_STREAMS - 1][0] - 1) pre-sync assertion and its associated workaround comment.
Added a hipStreamQuery loop that checks for hipErrorNotReady; this confirms at least one stream still has pending work without touching any shared host buffer.
The post-sync assertion (REQUIRE(NUM_ITERS == A[NUM_STREAMS - 1][0] - 1)) is unchanged.
JIRA ID
ROCM-20519
Test Plan
Unit_hipDeviceSynchronize_Functional: verify the test passes and no longer requires oversized copies to mask timing races.
Test Result
No regressions observed in the modified test case.
Submission Checklist